Tagging and Glossing Sesotho

Authors

  • Mark Johnson
  • Katherine Demuth

Abstract

This paper describes a system for morphological tagging and glossing of Sesotho, a southern Bantu language. Sesotho has a rich agglutinative morphology, and morphemes cannot be disambiguated on the basis of the bigram or trigram statistics that work so well for languages like English. Our system estimates a simple PCFG for Sesotho clauses from a small hand-annotated corpus in an unsupervised manner. It uses this PCFG and a small set of hand-coded constraints to produce a ranked list of possible tags and corresponding glosses for untagged clauses.
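
The pipeline the abstract outlines (estimate rule probabilities, filter candidates with constraints, rank what remains) might look roughly like the sketch below. This is an illustration only, not the authors' implementation: the abstract does not say how the PCFG is estimated, so plain relative-frequency counting over annotated derivations is assumed here, and the tag names (SM, OM, VSTEM, MOOD), the toy corpus, and the `mood_final` constraint are all hypothetical.

```python
from collections import Counter
from math import prod

def estimate_pcfg(derivations):
    """Assumed estimator: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(rule for d in derivations for rule in d)
    lhs_counts = Counter()
    for (lhs, _rhs), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

def rank_analyses(candidates, probs, constraints=()):
    """Drop candidates that violate any constraint, then sort by derivation probability."""
    scored = []
    for tags, derivation in candidates:
        if all(ok(tags) for ok in constraints):
            # Rules never seen in the annotated corpus get probability 0.
            score = prod(probs.get(rule, 0.0) for rule in derivation)
            scored.append((score, tags))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical annotated corpus: each clause is the list of rules in its derivation.
R_SM_VERB = ("CLAUSE", ("SM", "VERB"))
R_VERB_ONLY = ("CLAUSE", ("VERB",))
R_PLAIN = ("VERB", ("VSTEM", "MOOD"))
R_WITH_OM = ("VERB", ("OM", "VSTEM", "MOOD"))
corpus = [
    [R_SM_VERB, R_PLAIN],
    [R_SM_VERB, R_WITH_OM],
    [R_SM_VERB, R_PLAIN],
    [R_VERB_ONLY, R_PLAIN],
]
probs = estimate_pcfg(corpus)

# Hypothetical hand-coded constraint: the mood marker must be clause-final.
def mood_final(tags):
    return tags[-1] == "MOOD"

# Competing analyses of one untagged three-morpheme clause.
candidates = [
    (("OM", "VSTEM", "MOOD"), [R_VERB_ONLY, R_WITH_OM]),
    (("SM", "VSTEM", "MOOD"), [R_SM_VERB, R_PLAIN]),
    (("SM", "MOOD", "VSTEM"), [R_SM_VERB, ("VERB", ("MOOD", "VSTEM"))]),
]
ranked = rank_analyses(candidates, probs, [mood_final])
for score, tags in ranked:
    print(f"{score:.4f}  {' '.join(tags)}")
```

In this toy setup the third candidate is removed by the constraint before scoring, and the remaining two are ordered by the product of their rule probabilities. A real system would also need to map each ranked tag sequence to its gloss, which the sketch leaves out.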

Similar resources

Automatic interlinear glossing as two-level sequence classification

Interlinear glossing is a type of annotation of morphosyntactic categories and crosslinguistic lexical correspondences that allows linguists to analyse sentences in languages that they do not necessarily speak. Automatising this annotation is necessary in order to provide glossed corpora big enough to be used for quantitative studies. In this paper, we present experiments on the automatic gloss...

The Effect of Visual Representation, Textual Representation, and Glossing on Second Language Vocabulary Learning

In this study, the researcher chose three different vocabulary techniques (Visual Representation, Textual Enhancement, and Glossing) and compared them with the traditional method of teaching vocabulary. Eighty advanced EFL learners were assigned to four intact groups (three experimental and one control group) using a proficiency test and a vocabulary test as a pre-test. In the visual group, stu...

Empirical measurements on a Sesotho tone labeling algorithm

This article discusses the empirical assessments employed on two versions of a Sesotho tone labeling algorithm. This algorithm uses linguistically-defined Sesotho tonal rules to predict the tone labels on the syllables of Sesotho words. The two versions differed in the number of tonal rules that they employ as well as the lexical categories that the tone rules apply to. Both versions were tested o...

MADA+TOKAN: A Toolkit for Arabic Tokenization, Diacritization, Morphological Disambiguation, POS Tagging, Stemming and Lemmatization

We describe the MADA+TOKAN toolkit, a versatile and freely available system that can derive extensive morphological and contextual information from raw Arabic text, and then use this information for a multitude of crucial NLP tasks. Applications include high-accuracy part-of-speech tagging, diacritization, lemmatization, disambiguation, stemming, and glossing. MADA operates by examining a list ...

Utility of the Koppitz norms for the Bender Gestalt Test performance of a group of Sesotho-speaking children.

OBJECTIVE: This study investigated the utility of the Koppitz administration, scoring and norms for the Bender Gestalt Test (BGT) as a neurocognitive screening instrument for Sesotho-speaking children. METHOD: The BGT protocols of 671 Sesotho-speaking children between the ages of seven and nine were reviewed. Data pertaining to socioeconomic status were also gathered for 360 of the participants...

Publication date: 2007